Influence of accurate compound noun splitting on bilingual vocabulary extraction
Abstract
The influence of compound noun splitting on a German-Polish bilingual vocabulary extraction task is investigated. To this end, several unsupervised methods for increasingly accurate compound noun splitting are introduced. Bilingual evidence from a parallel German-Polish corpus and co-occurrence counts from the web are used to disambiguate compound noun analyses directly. The splits collected in this way serve as training data for a probabilistic model that abstracts away from the errors made by the direct methods and reaches an F-measure of 95.10%. The methods are then evaluated in terms of word alignment quality and extraction accuracy, where linguistically accurate methods are found to outperform the corpus-based methods proposed in the literature. A comparison of the alignment quality achieved with the best splitting method and with the baseline suggests that the effort of building supervised splitting methods might yield minimal or no performance gains.
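The abstract does not say which corpus-based methods from the literature it compares against. One widely used heuristic of that kind scores each candidate split by the geometric mean of the corpus frequencies of its parts and keeps the highest-scoring one. The sketch below illustrates that heuristic only, as an assumption about what such a baseline might look like; the word counts, the linker list, and the names `FREQ`, `candidates`, `score`, and `best_split` are hypothetical and do not come from the paper.

```python
from math import prod

# Toy corpus counts (hypothetical numbers, not taken from any real corpus).
FREQ = {"aktionsplan": 2, "aktion": 50, "plan": 80, "akt": 10, "ion": 5}

LINKERS = ("", "s", "es")  # a simplified set of German linking elements
MIN_PART = 3               # shortest part length considered


def candidates(word, freq):
    """Yield the unsplit word and every split whose non-final parts are known."""
    yield (word,)
    for i in range(MIN_PART, len(word) - MIN_PART + 1):
        left, right = word[:i], word[i:]
        for linker in LINKERS:
            if not left.endswith(linker):
                continue
            stem = left[: len(left) - len(linker)]
            if len(stem) >= MIN_PART and freq.get(stem, 0) > 0:
                # Recurse on the remainder so splits into 3+ parts are covered.
                for tail in candidates(right, freq):
                    yield (stem,) + tail


def score(parts, freq):
    """Geometric mean of the part frequencies; unknown parts count as 0."""
    counts = [freq.get(p, 0) for p in parts]
    return prod(counts) ** (1.0 / len(counts))


def best_split(word, freq=FREQ):
    """Return the candidate split with the highest geometric-mean score."""
    return max(candidates(word.lower(), freq), key=lambda p: score(p, freq))


print(best_split("Aktionsplan"))
```

With these toy counts the split into `aktion` and `plan` scores higher than both the unsplit word and the three-way split through `akt` and `ion`. A purely frequency-driven choice like this can prefer linguistically wrong splits when short, frequent strings happen to be substrings, which is the kind of error that more accurate methods aim to avoid.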
Similar resources
Splitting Noun Compounds via Monolingual and Bilingual Paraphrasing: A Study on Japanese Katakana Words
Word boundaries within noun compounds are not marked by white spaces in a number of languages, unlike in English, and it is beneficial for various NLP applications to split such noun compounds. In the case of Japanese, noun compounds made up of katakana words (i.e., transliterated foreign words) are particularly difficult to split, because katakana words are highly productive and are often outo...
Noun versus Verb Bias in Mandarin-English Bilingual Pre-School Children
This study investigated the presence of noun or verb bias in 15 Mandarin-English bilingual pre-school children. The naturalistic bilingual child-caregiver interactions were tape-recorded for 30 minutes each time. The study also addressed the relationship between children's language production and the salient positions of the caregivers' language input. The findings show that the bilingual childr...
Morphological deficits related to noun in Kurdish-Persian bilingual Broca’s aphasic patients
Introduction: Bilingual aphasia is one of the major issues in linguistics and has attracted the attention of neurologists, linguists, and psychologists of language. This study investigates the morphological deficits of nouns (singular, plural, and collective) in Kurdish-Persian bilingual Broca’s aphasic patients. Materials and Methods: The research method was descri...
Language-specific noun bias: evidence from bilingual children.
Most evidence concerning cross-linguistic variation in noun bias, the preponderance of nouns in early expressive lexicons (Gentner, 1982), has come from comparisons of monolingual children acquiring different languages. Such designs are susceptible to a number of potential confounders, including group differences in developmental level and sociodemographic characteristics. The aim of this study...
A System for Compound Noun Multiword Expression Extraction for Hindi
Compound noun multiword expressions are important for many NLP applications like machine translation and information retrieval. This paper describes a system for Hindi compound noun multiword expressions (MWE) extraction from a given corpus. We identify major categories of compound noun MWEs, based on linguistic and psycholinguistic principles. Our extraction methods use various statistical co-...
Publication date: 2008